23 research outputs found

    Detecting Shifts in Public Opinion: A Big Data Study of Global News Content

    Women are seen more than heard in online newspapers

    Feminist news media researchers have long contended that masculine news values shape journalists’ quotidian decisions about what is newsworthy. As a result, it is argued, topics and issues traditionally regarded as primarily of interest and relevance to women are routinely marginalised in the news, while men’s views and voices are given privileged space. When women do show up in the news, it is often as “eye candy”, thus reinforcing the idea that women’s value lies in providing visual pleasure rather than in the content of their views. To date, evidence to support such claims has tended to be based on small-scale, manual analyses of news content. In this article, we report on findings from our large-scale, data-driven study of gender representation in online English-language news media. We analysed both words and images so as to give a broader picture of how gender is represented in online news. The corpus of news content examined consists of 2,353,652 articles collected over a period of six months from more than 950 different news outlets. From this initial dataset, we extracted 2,171,239 references to named persons and 1,376,824 images, and resolved the gender of names and faces using automated computational methods. We found that males were represented more often than females in both images and text, but in proportions that changed across topics, news outlets and mode. Moreover, the proportion of females was consistently higher in images than in text, for virtually all topics and news outlets; women were more likely to be represented visually than to be mentioned as a news actor or source. Our large-scale, data-driven analysis offers important empirical evidence of macroscopic patterns in news content concerning the way men and women are represented.
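    The text-versus-image comparison described above reduces to computing, for each topic (or outlet), the share of female references separately for each mode. The sketch below assumes that names and faces have already been gender-classified upstream; the record format and the function names `female_share` and `topics_where_images_more_female` are hypothetical and are not taken from the study.

```python
from collections import Counter


def female_share(records):
    """Return {(topic, mode): fraction of references classified as female}.

    Assumed (hypothetical) input: an iterable of (topic, mode, gender) tuples,
    one per resolved reference, where mode is "text" (a named person) or
    "image" (a detected face) and gender is the label from an upstream classifier.
    """
    totals = Counter()
    females = Counter()
    for topic, mode, gender in records:
        totals[(topic, mode)] += 1
        if gender == "female":
            females[(topic, mode)] += 1
    # Every key in totals has at least one reference, so the division is safe.
    return {key: females[key] / totals[key] for key in totals}


def topics_where_images_more_female(shares):
    """List topics whose female share is higher in images than in text."""
    topics = {topic for topic, _mode in shares}
    return sorted(
        topic
        for topic in topics
        if (topic, "image") in shares
        and (topic, "text") in shares
        and shares[(topic, "image")] > shares[(topic, "text")]
    )
```

    The same grouping works per outlet by replacing the topic field with an outlet identifier.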

    Content Analysis of 150 Years of British Periodicals

    Previous studies have shown that it is possible to detect macroscopic patterns of cultural change over periods of centuries by analyzing large textual time series, specifically digitized books. This method promises to empower scholars with a quantitative and data-driven tool to study culture and society, but its power has been limited by the use of data from books and by simple analytics based essentially on word counts. This study addresses these problems by assembling a vast corpus of regional newspapers from the United Kingdom, incorporating very fine-grained geographical and temporal information that is not available for books. The corpus spans 150 years and comprises millions of articles, representing 14% of all British regional outlets of the period. Simple content analysis of this corpus allowed us to detect specific events, like wars, epidemics, coronations, or conclaves, with high accuracy, whereas the use of more refined techniques from artificial intelligence enabled us to move beyond counting words by detecting references to named entities. These techniques allowed us to observe both a systematic underrepresentation and a steady increase of women in the news during the 20th century, as well as changes in the geographic focus of various concepts. We also estimate the dates when electricity overtook steam and trains overtook horses as a means of transportation, both around the year 1900, along with observing other cultural transitions. We believe that these data-driven approaches can complement the traditional method of close reading in detecting trends of continuity and change in historical corpora.
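    One simple way to date a transition such as “electricity overtook steam” is to find the first year from which one term’s relative frequency stays above another’s until the end of the series. The sketch below is an assumption, not necessarily the estimator used in the study: `crossover_year` and its optional moving-average `window` are hypothetical, and the inputs are per-year relative frequencies that would have to be computed from the corpus.

```python
def crossover_year(years, freq_a, freq_b, window=1):
    """Return the first year from which term A stays above term B, or None.

    years, freq_a, freq_b: equal-length sequences, one value per year.
    window: width of a centred moving average applied to both series
    (1 means no smoothing; odd values give a symmetric window).
    """
    if not (len(years) == len(freq_a) == len(freq_b)):
        raise ValueError("series must have equal length")

    def smooth(xs):
        half = window // 2
        out = []
        for i in range(len(xs)):
            lo, hi = max(0, i - half), min(len(xs), i + half + 1)
            out.append(sum(xs[lo:hi]) / (hi - lo))
        return out

    a, b = smooth(freq_a), smooth(freq_b)
    result = None
    for year, x, y in zip(years, a, b):
        if x > y:
            # Start of a run where A leads; keep the earliest year of the run.
            if result is None:
                result = year
        else:
            # A fell back to or below B, so any earlier run does not count.
            result = None
    return result
```

    Requiring A to stay ahead, rather than taking the first year it leads, keeps a single noisy year from being reported as the transition; the smoothing window serves the same purpose.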

    Automated Analysis of Narrative Content for Digital Humanities

    Keywords: story grammar, triplets, semantic graphs, computational social science.
    We present a methodology for large-scale quantitative narrative analysis (QNA) of text data, which draws on various recent ideas from text mining and pattern analysis in order to solve a problem arising in digital humanities and social sciences. The key idea is to automatically transform the corpus into a network by extracting the key actors and objects of the narration and linking them, and then to analyze this network to extract information about those actors. These actors can be characterized by studying their position in the overall network of actors and actions, by generating scatter plots describing the subject/object bias of each actor, and by investigating the types of actions each actor is most associated with. The software pipeline is demonstrated on text obtained from three story books from Project Gutenberg. Our analysis shows that the approach correctly identifies the most central actors in a given narrative. We also find that the hero of a narrative always has the highest degree in the network. The hero is most often the subject of actions, but is not the actor with the highest subject-bias score. Our methodology is highly scalable, and addresses specific research needs that are currently very labour-intensive in social sciences and digital humanities.
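    The actor characterisation described above can be sketched as follows, assuming subject-verb-object triplets have already been extracted by a parser. The function name `actor_stats`, the use of distinct neighbours as the degree, and the bias formula (s - o) / (s + o) are illustrative choices, not the paper’s exact definitions.

```python
from collections import Counter, defaultdict


def actor_stats(triplets, top_k=3):
    """Summarise each actor in a list of (subject, verb, object) triplets.

    Returns {actor: {"degree", "subject_bias", "top_verbs"}} where
    - degree counts distinct actors linked to this one (undirected),
    - subject_bias = (s - o) / (s + o), from -1 (always object) to +1 (always subject),
    - top_verbs lists the actor's most frequent verbs when acting as subject.
    """
    neighbours = defaultdict(set)
    as_subject = Counter()
    as_object = Counter()
    verbs = defaultdict(Counter)
    for subj, verb, obj in triplets:
        as_subject[subj] += 1
        as_object[obj] += 1
        verbs[subj][verb] += 1
        if subj != obj:  # ignore self-loops when counting neighbours
            neighbours[subj].add(obj)
            neighbours[obj].add(subj)
    stats = {}
    for actor in set(as_subject) | set(as_object):
        s, o = as_subject[actor], as_object[actor]
        stats[actor] = {
            "degree": len(neighbours[actor]),
            "subject_bias": (s - o) / (s + o),  # s + o >= 1 for every actor seen
            "top_verbs": [v for v, _ in verbs[actor].most_common(top_k)],
        }
    return stats
```

    The most central actor can then be read off with `max(stats, key=lambda a: stats[a]["degree"])`, and plotting `as_subject` against `as_object` counts per actor gives the kind of subject/object scatter plot the abstract mentions.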

    Scalable Preference Learning from Data Streams

    ElectionWatch: Detecting Patterns in News Coverage of US Elections

    We present a web tool that allows users to explore news stories concerning the 2012 US presidential election via an interactive interface. The tool is based on concepts of “narrative analysis”, where the key actors of a narration are identified, along with their relations, in what are sometimes called “semantic triplets” (one example of a triplet of this kind is “Romney Criticised Obama”). The network of actors and their relations can be mined for insights about the structure of the narration, including the identification of the key players, the network of political support of each of them, a representation of the similarity of their political positions, and other information concerning their role in the media narration of events. The interactive interface allows users to retrieve the news reports supporting the relations of interest.
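    Retrieving the reports that support a given relation can be sketched as an index from triplets to article identifiers, queried with optional wildcards. This is a minimal illustration, not the tool’s actual implementation: the class `TripletIndex`, its methods, and the article IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triplet = Tuple[str, str, str]  # (subject, verb, object)


class TripletIndex:
    """Map (subject, verb, object) triplets to the IDs of articles containing them."""

    def __init__(self) -> None:
        self._articles: Dict[Triplet, Set[str]] = defaultdict(set)

    def add(self, article_id: str, triplets: Iterable[Triplet]) -> None:
        """Record that article_id contains each of the given triplets."""
        for triplet in triplets:
            self._articles[triplet].add(article_id)

    def supporting(
        self,
        subject: Optional[str] = None,
        verb: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[str]:
        """Return sorted IDs of articles with a matching triplet; None is a wildcard."""
        hits: Set[str] = set()
        for (s, v, o), ids in self._articles.items():
            if subject in (None, s) and verb in (None, v) and obj in (None, o):
                hits |= ids
        return sorted(hits)
```

    For example, after `idx.add("a1", [("Romney", "criticised", "Obama")])`, the call `idx.supporting("Romney", "criticised", "Obama")` returns `["a1"]`. A linear scan keeps the sketch short; a real system would index each field separately.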